-
Many remarkable phenotypes have repeatedly occurred across vast evolutionary distances. When convergent traits emerge on the tree of life, they are sometimes driven by the same underlying gene families, while other times, many different gene families are involved. Conversely, a gene family may be repeatedly recruited for a single trait or many different traits. To understand the general rules governing convergence at both genomic and phenotypic levels, we systematically tested associations between 56 binary metabolic traits and gene count in 14,785 gene families from 993 Saccharomycotina yeasts. Using a recently developed phylogenetic approach that reduces spurious correlations, we found that gene family expansion and contraction were significantly linked to trait gain and loss in 45/56 (80%) traits. While 595/739 (81%) significant gene families were associated with only one trait, we also identified several “keystone” gene families that were significantly associated with up to 13/56 (23%) of all traits. Strikingly, most of these families are known to encode metabolic enzymes and transporters, including all members of the industrially relevant MALtose fermentation loci in the baker’s yeast Saccharomyces cerevisiae. These results indicate that convergent evolution on the gene family level may be more widespread across deeper timescales than previously believed.
Free, publicly-accessible full text available June 10, 2026
-
Abstract Multiple sequence alignments and phylogenetic trees are rich in biological information and are fundamental to research in biology. PhyKIT is a tool for processing and analyzing the information content of multiple sequence alignments and phylogenetic trees. Here, we describe how to use PhyKIT for diverse analyses, including (i) constructing a phylogenomic supermatrix, (ii) detecting errors in orthology inference, (iii) quantifying biases in phylogenomic data sets, (iv) identifying radiation events or lack of resolution using gene support frequencies, and (v) conducting evolution‐based screens to facilitate gene function prediction. Several PhyKIT functions that streamline multiple sequence alignment and phylogenetic processing, such as renaming FASTA entries or tree tips, are also discussed. These protocols demonstrate how simple command‐line operations in the unified framework of PhyKIT facilitate diverse phylogenomic data analysis and processing, from supermatrix construction and diagnosis to gaining clues about gene function. © 2024 The Author(s). Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Installing PhyKIT and syntax for usage
Basic Protocol 2: Constructing a phylogenomic supermatrix
Basic Protocol 3: Detecting anomalies in orthology relationships
Basic Protocol 4: Quantifying biases in phylogenomic data matrices and related measures
Basic Protocol 5: Identifying polytomies
Basic Protocol 6: Assessing gene‐gene coevolution as a genetic screen
Free, publicly-accessible full text available November 1, 2025
-
ABSTRACT Yeasts in the subphylum Saccharomycotina are found across the globe in disparate ecosystems. A major aim of yeast research is to understand the diversity and evolution of ecological traits, such as carbon metabolic breadth, insect association, and cactophily. This includes studying aspects of ecological traits like genetic architecture or association with other phenotypic traits. Genomic resources in the Saccharomycotina have grown rapidly. Ecological data, however, are still limited for many species, especially those only known from species descriptions where usually only a limited number of strains are studied. Moreover, ecological information is recorded in natural language format, limiting high-throughput computational analysis. To address these limitations, we developed an ontological framework for the analysis of yeast ecology. A total of 1,088 yeast strains were added to the Ontology of Yeast Environments (OYE) and analyzed in a machine‐learning framework to connect genotype to ecology. This framework is flexible and can be extended to additional isolates, species, or environmental sequencing data. Widespread adoption of OYE would greatly aid the study of macroecology in the Saccharomycotina subphylum.
-
Kamoun, Sophien (Ed.) Many distantly related organisms have convergently evolved traits and lifestyles that enable them to live in similar ecological environments. However, the extent of phenotypic convergence evolving through the same or distinct genetic trajectories remains an open question. Here, we leverage a comprehensive dataset of genomic and phenotypic data from 1,049 yeast species in the subphylum Saccharomycotina (Kingdom Fungi, Phylum Ascomycota) to explore signatures of convergent evolution in cactophilic yeasts, ecological specialists associated with cacti. We inferred that the ecological association of yeasts with cacti arose independently approximately 17 times. Using a machine learning–based approach, we further found that cactophily can be predicted with 76% accuracy from both functional genomic and phenotypic data. The most informative feature for predicting cactophily was thermotolerance, which we found to be likely associated with altered evolutionary rates of genes impacting the cell envelope in several cactophilic lineages. We also identified horizontal gene transfer and duplication events of plant cell wall–degrading enzymes in distantly related cactophilic clades, suggesting that putatively adaptive traits evolved independently through disparate molecular mechanisms. Notably, we found that multiple cactophilic species and their close relatives have been reported as emerging human opportunistic pathogens, suggesting that the cactophilic lifestyle (and perhaps more generally lifestyles favoring thermotolerance) might preadapt yeasts to cause human disease. This work underscores the potential of a multifaceted approach involving high-throughput genomic and phenotypic data to shed light onto ecological adaptation and highlights how convergent evolution to wild environments could facilitate the transition to human pathogenicity.
-
Abstract Gene gains and losses are a major driver of genome evolution; their precise characterization can provide insights into the origin and diversification of major lineages. Here, we examined gene family evolution of 1154 genomes from nearly all known species in the medically and technologically important yeast subphylum Saccharomycotina. We found that yeast gene family evolution differs from that of plants, animals, and filamentous ascomycetes, and is characterized by smaller overall gene numbers yet larger gene family sizes for a given gene number. Faster-evolving lineages (FELs) in yeasts experienced significantly higher rates of gene losses (commensurate with a narrowing of metabolic niche breadth) but higher speciation rates than their slower-evolving sister lineages (SELs). Gene families most often lost are those involved in mRNA splicing, carbohydrate metabolism, and cell division and are likely associated with intron loss, metabolic breadth, and non-canonical cell cycle processes. Our results highlight the significant role of gene family contractions in the evolution of yeast metabolism, genome function, and speciation, and suggest that gene family evolutionary trajectories have differed markedly across major eukaryotic lineages.
-
Abstract Codon usage bias, or the unequal use of synonymous codons, is observed across genes, genomes, and between species. It has been implicated in many cellular functions, such as translation dynamics and transcript stability, but can also be shaped by neutral forces. We characterized codon usage across 1,154 strains from 1,051 species from the fungal subphylum Saccharomycotina to gain insight into the biases, molecular mechanisms, evolution, and genomic features contributing to codon usage patterns. We found a general preference for A/T-ending codons and correlations between codon usage bias, GC content, and tRNA-ome size. Codon usage bias is distinct between the 12 orders to such a degree that yeasts can be classified with an accuracy >90% using a machine learning algorithm. We also characterized the degree to which codon usage bias is impacted by translational selection. We found it was influenced by a combination of features, including the number of coding sequences, BUSCO count, and genome length. Our analysis also revealed an extreme bias in codon usage in the Saccharomycodales associated with a lack of predicted arginine tRNAs that decode CGN codons, leaving only the AGN codons to encode arginine. Analysis of Saccharomycodales gene expression, tRNA sequences, and codon evolution suggests that avoidance of the CGN codons is associated with a decline in arginine tRNA function. Consistent with previous findings, codon usage bias within the Saccharomycotina is shaped by genomic features and GC bias. However, we find cases of extreme codon usage preference and avoidance along yeast lineages, suggesting additional forces may be shaping the evolution of specific codons.
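As an illustration of the kind of measurement behind the arginine result, the following minimal Python sketch counts how often a coding sequence uses the CGN arginine codons versus the AG-starting arginine codons (AGA and AGG in the standard genetic code). It is not the authors' pipeline: the function name `arginine_codon_usage` and the toy sequence are hypothetical, and the sketch assumes an in-frame coding sequence translated with the standard genetic code.

```python
from collections import Counter

# Standard genetic code: the six codons that encode arginine.
ARG_CGN = {"CGT", "CGC", "CGA", "CGG"}
ARG_AGR = {"AGA", "AGG"}  # AGT/AGC encode serine, so only AGA/AGG count here


def arginine_codon_usage(cds: str) -> dict:
    """Summarize arginine codon usage in an in-frame coding sequence.

    Hypothetical helper for illustration only; assumes the standard code.
    """
    cds = cds.upper().replace("U", "T")
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in ARG_CGN or c in ARG_AGR)
    total = sum(counts.values())
    cgn = sum(counts[c] for c in ARG_CGN)
    return {
        "total_arg": total,
        # Fraction of arginine codons that are CGN; None if no arginine codons.
        "cgn_fraction": cgn / total if total else None,
        "counts": dict(counts),
    }


if __name__ == "__main__":
    # Toy sequence: ATG CGT AGA AGG TAA
    print(arginine_codon_usage("ATGCGTAGAAGGTAA"))
```

A CGN fraction near zero across the coding sequences of a genome would be consistent with the avoidance pattern described for the Saccharomycodales, although a real analysis would also need to account for gene expression and overall GC content.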
-
Genome-scale amounts of data and the development of novel statistical phylogenetic approaches have greatly aided the reconstruction of a broad sketch of the tree of life and resolved many of its branches. However, incongruence, the inference of conflicting evolutionary histories, remains pervasive in phylogenomic data. We synthesize the biological and analytical factors that drive incongruence, discuss methodological advances to diagnose and handle incongruence, and identify avenues for future research. The study of incongruence has enabled a deeper understanding of phylogenesis and improved our ability to reconstruct and interpret the tree of life.
-
How genomic differences contribute to phenotypic differences is a major question in biology. The recently characterized genomes, isolation environments, and qualitative patterns of growth on 122 sources and conditions of 1,154 strains from 1,049 fungal species (nearly all known) in the yeast subphylum Saccharomycotina provide a powerful, yet complex, dataset for addressing this question. We used a random forest algorithm trained on these genomic, metabolic, and environmental data to predict growth on several carbon sources with high accuracy. Known structural genes involved in assimilation of these sources and presence/absence patterns of growth in other sources were important features contributing to prediction accuracy. By further examining growth on galactose, we found that it can be predicted with high accuracy from either genomic (92.2%) or growth data (82.6%) but not from isolation environment data (65.6%). Prediction accuracy was even higher (93.3%) when we combined genomic and growth data. After the GALactose utilization genes, the most important feature for predicting growth on galactose was growth on galactitol, raising the hypothesis that several species in two orders, Serinales and Pichiales (containing the emerging pathogen Candida auris and the genus Ogataea, respectively), have an alternative galactose utilization pathway because they lack the GAL genes. Growth and biochemical assays confirmed that several of these species utilize galactose through an alternative oxidoreductive D-galactose pathway, rather than the canonical GAL pathway. Machine learning approaches are powerful for investigating the evolution of the yeast genotype–phenotype map, and their application will uncover novel biology, even in well-studied traits.
-
Abstract Maximum likelihood (ML) phylogenetic inference is widely used in phylogenomics. As heuristic searches most likely find suboptimal trees, it is recommended to conduct multiple (e.g., 10) tree searches in phylogenetic analyses. However, beyond its positive role, how and to what extent multiple tree searches aid ML phylogenetic inference remains poorly explored. Here, we found that a random starting tree was not as effective as the BioNJ and parsimony starting trees in inferring the ML gene tree and that RAxML-NG and PhyML were less sensitive to different starting trees than IQ-TREE. We then examined the effect of the number of tree searches on ML tree inference with IQ-TREE and RAxML-NG, by running 100 tree searches on 19,414 gene alignments from 15 animal, plant, and fungal phylogenomic datasets. We found that the number of tree searches substantially impacted the recovery of the best-of-100 ML gene tree topology among 100 searches for a given ML program. In addition, all of the concatenation-based trees were topologically identical if the number of tree searches was ≥10. Quartet-based ASTRAL trees inferred from 1 to 80 tree searches differed topologically from those inferred from 100 tree searches for 6/15 phylogenomic datasets. Finally, our simulations showed that gene alignments with lower difficulty scores had a higher chance of finding the best-of-100 gene tree topology and were more likely to yield the correct trees.
-
Townsend, Jeffrey (Ed.) Abstract Siderophores are crucial for iron-scavenging in microorganisms. While many yeasts can uptake siderophores produced by other organisms, they are typically unable to synthesize siderophores themselves. In contrast, Wickerhamiella/Starmerella (W/S) clade yeasts gained the capacity to make the siderophore enterobactin following the remarkable horizontal acquisition of a bacterial operon enabling enterobactin synthesis. Yet, how these yeasts absorb the iron bound by enterobactin remains unresolved. Here, we demonstrate that Enb1 is the key enterobactin importer in the W/S-clade species Starmerella bombicola. Through phylogenomic analyses, we show that ENB1 is present in all W/S clade yeast species that retained the enterobactin biosynthetic genes. Conversely, it is absent in species that lost the ent genes, except for Starmerella stellata, making this species the only cheater in the W/S clade that can utilize enterobactin without producing it. Through phylogenetic analyses, we infer that ENB1 is a fungal gene that likely existed in the W/S clade prior to the acquisition of the ent genes and subsequently experienced multiple gene losses and duplications. Through phylogenetic topology tests, we show that ENB1 likely underwent horizontal gene transfer from an ancient W/S clade yeast to the order Saccharomycetales, which includes the model yeast Saccharomyces cerevisiae, followed by extensive secondary losses. Taken together, these results suggest that the fungal ENB1 and bacterial ent genes were cooperatively integrated into a functional unit within the W/S clade that enabled adaptation to iron-limited environments. This integrated fungal-bacterial circuit and its dynamic evolution determine the extant distribution of yeast enterobactin producers and cheaters.
